Abstract. We show that the absolute value of the determinant of a matrix with random independent
(but not necessarily iid) entries is strongly concentrated around its mean.

As an application, we show that Godsil-Gutman and Barvinok estimators for the permanent of 
a strictly positive matrix give sub-exponential approximation ratios with high probability. 

A positive answer to the main conjecture of the paper would lead to polynomial approximation 
ratios in the above problem. 



1. Introduction 

Let A be an n x n square matrix. We denote by det A and per A its determinant and permanent,
respectively, which are defined by

$\det A = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^n a_{i\sigma(i)}, \qquad \mathrm{per} A = \sum_{\sigma \in S_n} \prod_{i=1}^n a_{i\sigma(i)}$,

where the sums are taken over all permutations in $S_n$ and $a_{ij}$ denotes the ij entry of A.
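
As a concrete illustration of the two definitions, the following minimal Python sketch evaluates both permutation expansions directly. The function names are ours and purely illustrative, and the cost grows like n!, so this is only meant for very small matrices.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation given as a tuple of 0-based images."""
    n = len(perm)
    inversions = sum(
        1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def det_and_per(a):
    """Return (det, per) of a square matrix via the permutation expansions."""
    n = len(a)
    det = per = 0
    for perm in permutations(range(n)):
        term = 1
        for i in range(n):
            term *= a[i][perm[i]]
        det += sign(perm) * term  # signed sum gives the determinant
        per += term               # unsigned sum gives the permanent
    return det, per


print(det_and_per([[1, 2], [3, 4]]))  # (-2, 10)
```

The only difference between the two quantities is the sign attached to each permutation, which is what makes the determinant easy and the permanent hard to compute.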

In this paper, we focus on a random matrix A whose entries are independent (but not necessarily 
iid) random variables with mean zero. The size of A (which we denote by n) should be thought of
as tending to infinity and all asymptotic notation will be used under this assumption. 

Our main concern is the following basic question.
Question 1.1. How is |det A| distributed?

A special case is when the entries of A are iid Gaussian (with variance one). In this case, it is 
known that log |detA| satisfies the central limit theorem. 

Theorem 1.2. Let A be the random matrix of size n whose entries are iid Gaussian with variance
one. Then

$\frac{\log(|\det A|) - \frac{1}{2}\log((n-1)!)}{\sqrt{\frac{1}{2}\log n}}$

converges weakly to the standard Gaussian variable N(0, 1).



This statement is easy to verify, as one can write

$|\det A| = \prod_{i=1}^n d_i$,

where $d_i$ is the distance from the ith row vector of A to the subspace spanned by the first i - 1
rows. As A has iid Gaussian entries, the random variables $d_i$ are independent. Furthermore,
their distributions can be computed explicitly and the theorem follows from Lyapunov's Central
Limit Theorem and a routine calculation. (We include the details in Appendix A for the reader's
convenience.)
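
The product formula for |det A| can be checked numerically. The sketch below is our own illustration (the helper name `row_distances` is hypothetical): it computes each distance by Gram-Schmidt orthogonalization and assumes the rows are linearly independent, since a zero residual would cause a division by zero.

```python
import math


def row_distances(rows):
    """d_i = distance from row i to the span of the earlier rows."""
    basis = []  # orthogonal (unnormalized) vectors spanning the earlier rows
    dists = []
    for row in rows:
        residual = [float(x) for x in row]
        for b in basis:
            # Subtract the component of the current residual along b.
            coeff = sum(x * y for x, y in zip(residual, b)) / sum(y * y for y in b)
            residual = [x - coeff * y for x, y in zip(residual, b)]
        dists.append(math.sqrt(sum(x * x for x in residual)))
        basis.append(residual)
    return dists


a = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]  # det = 18 by cofactor expansion
print(math.prod(row_distances(a)))
```

Up to floating point error, the printed product agrees with the absolute value of the determinant of the example matrix.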

The situation with general random matrices is considerably more complicated. In [6], Girko claimed
that Theorem 1.2 still holds if the entries are no longer Gaussian, but still iid with mean zero and
variance one. We believe that this statement is true, but could not understand Girko's proof. On 
the other hand, it seems possible that one can give an alternative proof using recent developments 
in the field. 

In this paper, instead of limiting distribution, we focus on tail inequalities, which are usually very 
useful in probabilistic combinatorics and related fields. As an illustration, we present an application 
concerning the problem of computing the permanent using determinant estimators. A consequence 
of our main result shows that one can use a determinant estimator to estimate the permanent 
of a matrix of size n with positive entries within a sub-exponential factor $\exp(n^{2/3+o(1)})$ with high
probability. If Conjecture 1.4 holds, then the approximation will typically be within a polynomial
factor $n^{O(1)}$.

To start, we note an old observation of Turán that if the entries of A are iid with mean 0 and
variance 1, then $E(|\det A|^2) = E(\det A^2) = n!$. Combining this with Theorem 1.2 we obtain the
following corollary.

Corollary 1.3. Let A be the random matrix of size n whose entries are iid Gaussian with variance 
one. Then with probability tending to one 



(1) $\det A^2 = n^{-1+o(1)}\,E(\det A^2)$.

We believe that a similar result holds for all random matrices having independent entries with 
mean zero and bounded variances. 

Conjecture 1.4. Let c < C be positive constants. Let A be the random matrix of size n whose 
entries are independent random variables with mean zero and variances between c and C . Then 
with probability tending to one 



(2) $|\det A| = n^{O(1)}\,E|\det A|, \qquad \det A^2 = n^{O(1)}\,E(\det A^2)$.

This conjecture looks highly non-trivial. As a first step, we consider the case when the entries of 
A are scaled Bernoulli random variables (namely, the ij entry takes values $\pm c_{ij}$ with probability
one half each). Our experience is that this is usually the hardest case and its understanding would lead to
the solution of the general case. Our main result is 




Theorem 1.5. Let 0 < c < C and B > 0 be fixed. Let A be a random n x n matrix whose
entries $a_{ij}$ take values $\pm c_{ij}$ with probability 1/2, independently, where $c \le |c_{ij}| \le C$. Then with
probability $1 - n^{-B}$,

$|\det A| = \exp(O(n^{2/3}\log n))\,E(|\det A|)$,

and

$\det A^2 = \exp(O(n^{2/3}\log n))\,E(\det A^2)$.

Here the hidden constants in the O notation may depend on c, C, B.

In the case c = C = 1 (i.e., the entries of A are iid Bernoulli), a better bound $\exp(O(\sqrt{n\log n}))$ was
recently proved in [18]. The approach in [18], however, does not extend to random matrices with
entries having different variances. In the present approach, it seems to require some new ideas in
order to significantly reduce the constant 2/3.

If one assumes that the entries of A are Gaussian (with different variances $c_{ij}^2$), then a weaker bound
($\exp(\epsilon n)$ for any positive $\epsilon$) was proved by Friedland, Rider and Zeitouni [5]. Our Theorem 1.5 also
holds for this case, with the same proof (see Section 8), and thus we obtain an improvement for the
main result of [5].

2. Computing permanents 

Let us now consider detM and perM from the computational point of view. It is not hard to 
compute detM. In fact, there are effective algorithms to compute the whole spectra of M. The 
problem of computing perM, on the other hand, is notoriously hard, and has been a challenge in 
theoretical computer science for many years. 

A well-known observation that relates the problem of computing the permanent to that of the
determinant is the following. Let $u_{ij}$ be independent random variables with mean zero and variance one.
Given a matrix M with entries $a_{ij}$, define a random matrix A with entries $\sqrt{a_{ij}}\,u_{ij}$. Then, using
linearity of expectation, it is easy to verify that

(3) $E(\det A^2) = \mathrm{per}(M)$.

If det A^2 is strongly concentrated around its mean, then (3) leads to the following very simple
algorithm: Given M, create a random sample of A. Compute det A^2 and output it as an estimator
for per M. The core of the analysis is then to bound the degree of concentration of det A^2 around
its expectation.
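
The unbiasedness of this estimator can be verified exactly on a small example. The sketch below is our own illustration and makes two simplifying assumptions: the random variables are uniform random signs, and M has perfect-square integer entries so that all arithmetic stays exact. It averages the squared determinant over every sign pattern and compares the result with the permanent of M.

```python
from fractions import Fraction
from itertools import permutations, product


def expand(a, signed):
    """Permutation expansion: determinant if signed, permanent otherwise."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = 1
        for i in range(n):
            term *= a[i][perm[i]]
        if signed:
            inversions = sum(
                perm[i] > perm[j] for i in range(n) for j in range(i + 1, n)
            )
            if inversions % 2:
                term = -term
        total += term
    return total


def mean_squared_det(roots):
    """Exact average of det(A)^2 over all sign patterns, A_ij = +-roots[i][j]."""
    n = len(roots)
    total = 0
    for signs in product((1, -1), repeat=n * n):
        a = [[signs[i * n + j] * roots[i][j] for j in range(n)] for i in range(n)]
        total += expand(a, signed=True) ** 2
    return Fraction(total, 2 ** (n * n))


roots = [[1, 2, 1], [2, 1, 3], [1, 1, 2]]      # entrywise square roots of M
m = [[x * x for x in row] for row in roots]    # the matrix M itself
print(mean_squared_det(roots), expand(m, signed=False))
```

The two printed numbers coincide. A practical estimator would of course draw a single random sign pattern rather than enumerate all of them; the question studied in this paper is how far that single sample can be from the average.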

We mention here that in the case when M has non-negative entries, the famous work of Jerrum and
Sinclair [10] and Jerrum, Sinclair, and Vigoda [11] gave a fully polynomial randomized approximation
scheme for the problem, using the Markov chain Monte Carlo approach. Theoretically, this result
is as good as it gets. On the other hand, the determinant estimator approach is still of interest,
thanks to its simplicity and implementability. (The Markov chain algorithm requires running time
of order $n^7$.)
In [7], Godsil and Gutman proposed setting $u_{ij}$ to be iid Bernoulli random variables. Following
the literature, we call this algorithm the Godsil-Gutman estimator. This is perhaps the simplest
estimator. On the other hand, its analysis seems non-trivial. To illustrate this, let us consider the
case when M is the all-one matrix. Clearly per M = n!. On the other hand, it is already not easy
to prove that with high probability $\det A \ne 0$ (this was first done by Komlós [E]). Effective bounds
on |det A| have only recently become known (see [18]).

If one forces $u_{ij}$ to have a continuous distribution, the situation is more favorable. For instance, it is
trivial that $\det A \ne 0$ with probability one. By setting $u_{ij}$ to be iid Gaussian variables, Barvinok [4]
showed that one can approximate the permanent of a non-negative matrix within a factor of $c^n$, for
some constant 0 < c < 1. A well-known problem with using Gaussian (or continuous) distribution
is that in practice the implementation involves a truncated version of each variable. If the goal 
function (which is a function of many random variables) has a small Lipschitz coefficient, then this 
routine is effective. However, if its Lipschitz coefficient is large, then one needs to use a very fine 
approximation, and this increases the complexity of the input and would raise some challenges in 
implementation. 

It is known that if one allows the matrix to have zero entries, then determinant estimators do not 
necessarily give a good approximation to the permanent. For example, Barvinok gave an example 
where the permanent is 2*^ but the Godsil-Gutman estimator almost always returns 0, and another 
where his own estimator will almost surely perform no better than an exp(0(n)) approximation. 
On the other hand, Friedland, Rider and Zeitouni [5] showed that if the entries are strictly bounded
from above and below by positive constants, then the Barvinok estimator gives an approximation factor
$\exp(\epsilon n)$, for any fixed $\epsilon > 0$.

As a consequence of Theorem 1.5 and Theorem 4.1 we obtain the following improvement.

Theorem 2.1. Let A be a (deterministic) square matrix of size n with entries between c and
C, where c and C are positive constants. Then both the Godsil-Gutman and Barvinok estimators
approximate per A within a factor of $\exp(n^{2/3}\log n)$ with probability tending to one.

If Conjecture 1.4 holds, then one can improve the approximation factor to $n^{O(1)}$.

It remains a tantalizing problem to analyze the determinant estimator for the case when the entries
of A are not non-negative real numbers. Notice that (3) still holds in this case, but no effective
algorithm is known.

3. The main ideas 

We start with the well-known identity

(4) $\det A^2 = \det(AA^T) = \prod_{i=1}^n \sigma_i^2$,

where $0 \le \sigma_1 \le \sigma_2 \le \dots \le \sigma_n$ are the singular values of A.

If one could show that each singular value $\sigma_i$ is very strongly concentrated around some non-zero
value, then det A^2 would be so as well. Unfortunately, such a result is not available. In [2], it was
shown, via Talagrand's inequality, that the largest singular values are strongly concentrated, but
the degree of concentration decreases rather quickly as the index decreases.

To overcome this obstacle, we will follow the approach in [5], which is based on the fact that,
roughly speaking, the counting measure generated by the $\sigma_i$ is strongly concentrated. This fact
was proved by Guionnet and Zeitouni in an earlier paper [8], also using Talagrand's inequality.
Guionnet and Zeitouni's result asserts that (after a proper normalization by a factor $1/\sqrt{n}$) any
fixed interval, with high probability, contains the right number of singular values. This enables one
to show that the product of most of the singular values is close to the expectation.

The main technical barrier of this approach arises at the end of the spectrum. The Guionnet- 
Zeitouni result does not reveal any information about the few smallest singular values. In [5], the 
authors needed to exploit the Gaussian assumption (following an approach of Bai [3]) in order to take
care of these singular values. This technique, however, is not applicable for discrete distributions 
such as Bernoulli. In particular, it does not even show that a random matrix with discrete entries 
is non-singular with high probability. 

The proof of Theorem 1.5 requires two new ingredients. The first is a lower bound on the smallest
singular value $\sigma_n$. In [19], it was shown, for many models of random matrices, that $\sigma_n$ is at least
$n^{-C}$, for some constant C. While the models in [19] do not include the type of random matrices
we consider here, we are able to modify the proof, without too many difficulties, to treat our case.

To continue, naturally one would try to use the uniform bound $n^{-C}$ for all singular values which
have not been treated by the concentration result. These will be singular values which are less than
some threshold $\epsilon(n)$. It is now critical to estimate the number of such singular values. The value
of $\epsilon(n)$ will be too small for the concentration result of Guionnet and Zeitouni to give information
about this number. The second main ingredient of our proof is a method that provides a good
bound. This is based on a simple, but useful, identity (discovered in [20]) which gives a relation
between the singular values $\sigma_i$ and the distances $d_i$.



4. A more general theorem and the main lemmas

We will actually prove the following more general case of Theorem 1.5, where we merely require
the entries to be bounded and have bounded variance instead of being Bernoulli.

Theorem 4.1. Let K > 0, B > 0, and 0 < c < 1 be fixed. Let A be a random n x n matrix whose
entries $a_{ij}$ are random variables satisfying

• $c \le \mathrm{Var}(a_{ij}) \le 1/c$,

• $P(|a_{ij} - E(a_{ij})| \le K) = 1$.

Then with probability $1 - n^{-B}$,

$|\det A| = \exp(O(n^{2/3}\log n))\,E(|\det A|)$

and

$\det A^2 = \exp(O(n^{2/3}\log n))\,E(\det A^2)$,

where the constant implicit in the O notation depends on K, B, and c.
Remark 4.2. Here and elsewhere the relation $a_n = O(b_n)$ indicates that the ratio $a_n/b_n$ is bounded
above in absolute value as n tends to infinity. In particular, the theorem above gives both an upper
bound and a lower bound on the ratio between the determinant and its expectation.

Remark 4.3. The uniform boundedness condition can be replaced by the condition that all of the
entries have a Gaussian distribution; see Section 8.

Recall from (4) that

(5) $(\det A)^2 = \det(AA^T) = \prod_{\sigma \in \mathrm{spec}(AA^T)} \sigma = \prod_{i=1}^n \sigma_i^2$,

where $0 \le \sigma_1 \le \sigma_2 \le \dots \le \sigma_n$ are the singular values of A.

We will start in a similar way as in [5]. Let $\epsilon$ be a parameter to be determined later (which
may depend on n). We estimate (4) by dividing the spectrum into two parts, writing
$|\det A| = \det_{trunc} A \cdot \det_{small} A$, where

$\det_{trunc} A = \sqrt{\prod_{\sigma \in \mathrm{spec}(AA^T)} \max\{\sigma, \epsilon^2\}}$,

$\det_{small} A = \sqrt{\prod_{\sigma \in \mathrm{spec}(AA^T)} \min\{\sigma\epsilon^{-2}, 1\}}$.
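
The following short Python sketch illustrates the splitting on a list of eigenvalues of AA^T. It reads the two factors as the square roots of the products of max(sigma, eps^2) and min(sigma / eps^2, 1) over the spectrum of AA^T, which is our interpretation of the definitions above; the function name and the sample spectrum are hypothetical.

```python
import math


def split_determinant(eigs, eps):
    """Split |det A| into a truncated part and a small part.

    eigs: eigenvalues of A A^T (squared singular values), assumed positive.
    """
    trunc = math.sqrt(math.prod(max(lam, eps ** 2) for lam in eigs))
    small = math.sqrt(math.prod(min(lam / eps ** 2, 1.0) for lam in eigs))
    return trunc, small


eigs = [0.01, 0.25, 4.0, 9.0]  # singular values 0.1, 0.5, 2, 3
trunc, small = split_determinant(eigs, eps=1.0)
abs_det = math.sqrt(math.prod(eigs))
print(trunc, small, abs_det)
```

Each eigenvalue contributes max(sigma, eps^2) * min(sigma / eps^2, 1) = sigma to the product of the two factors, so the product recovers |det A| exactly. The truncated factor never sees values below the threshold, and the small factor is at most 1.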

We show that $\det_{trunc} A$ and $\det_{trunc} A^2$ are strongly concentrated around their means.

Lemma 4.4. There is a constant $c_0 > 0$ dependent only on c such that

$\det_{trunc} A = \exp(O(n\epsilon^{-2}\log n))\,E(\det_{trunc} A)$

and

$\det_{trunc} A^2 = \exp(O(n\epsilon^{-2}\log n))\,E(\det_{trunc} A^2)$

with probability $1 - O(n^{-c_0\log n})$.

The proof is presented in Section 5.
To handle $\det_{small} A$, notice that

(6) $1 \ge \det_{small} A \ge \min\{1, (\sigma_n(A)\epsilon^{-1})^{s_\epsilon(A)}\}$,

where $\sigma_n(A)$ is the smallest singular value of A and $s_\epsilon(A)$ denotes the number of singular values of
A which are at most $\epsilon$. We can therefore bound $\det_{small} A$ from below by using the following two
lemmas.

Lemma 4.5. For any B > 0,

$P(\sigma_n(A) \le n^{-4B-7}) \le n^{-B}$.

We remark that -4B - 7 is pretty far from being optimal and can be improved, but doing so would
not affect our final results in any essential way. This lemma is a variant of many results proved in
[19] (see also [17]). However, [19] required the distributions of the entries of A to be dominated
in a certain Fourier analytic sense by a single common distribution. Our matrices do not satisfy
this assumption. However, we are able to modify the proof, without too many difficulties, to obtain
the desired result.

Lemma 4.6. Let $r \ge \log^2 n$, and assume $c \le \mathrm{Var}(a_{ij}) \le 1/c$. Then

The above two lemmas combine to show that no singular value of A is likely to be so small as to
have too large an effect on the determinant, and, furthermore, we can also deduce that not many
singular values will have to be handled by $\det_{small}$.

Let us for now assume the previous two lemmas to be true. By taking r to be a suitable constant
multiple of $\epsilon n^{1/2}$ in Lemma 4.6 we see that with high probability $s_\epsilon(A) = O(n^{1/2}\epsilon)$.
Combining this with Lemma 4.5 and the bounds in (6), we see that for any B > 0 we have with
probability $1 - n^{-B+o(1)}$ that

$\det_{small} A = \exp(O(n^{1/2}\epsilon\log n))$,

which therefore implies that with the same probability

(7) $\det_{trunc} A \ge |\det A| \ge \exp(O(n^{1/2}\epsilon\log n))\,\det_{trunc} A$

(note again that the use of O in the lower bound here indicates an exponent bounded in magnitude).
Now let us fix $\epsilon = n^{1/6}$, which balances the exponent $n\epsilon^{-2}\log n$ of Lemma 4.4 against the
exponent $n^{1/2}\epsilon\log n$ above, both becoming $n^{2/3}\log n$. Combining the second half of the above
inequality with Lemma 4.4 we see that with probability $1 - n^{-B+o(1)}$ we have

$|\det A| \ge \exp(O(n^{2/3}\log n))\,E(\det_{trunc} A)$.

Taking expectations, we find that

(8) $E(\det_{trunc} A) \ge E(|\det A|) \ge (1 + o(1))\exp(O(n^{2/3}\log n))\,E(\det_{trunc} A)$.

The first half of Theorem 4.1 follows from combining (7), (8), and Lemma 4.4. The second half
follows from the identical argument being applied to det A^2.
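
As a numerical sanity check on the choice of the truncation level, the sketch below assumes that the two competing error exponents are n * eps^(-2) * log n (from the concentration of the truncated determinant) and n^(1/2) * eps * log n (from the small singular values). Under that assumption, which is our reading of the argument rather than a statement taken verbatim from it, the two exponents coincide at eps = n^(1/6) and both equal n^(2/3) * log n.

```python
import math


def error_exponents(n, eps):
    """Return the two exponents that the choice of eps has to balance."""
    concentration = n * eps ** -2 * math.log(n)   # truncated determinant
    small_values = math.sqrt(n) * eps * math.log(n)  # small singular values
    return concentration, small_values


n = 10 ** 6
balanced = error_exponents(n, n ** (1 / 6))
smaller_eps = error_exponents(n, n ** (1 / 12))
larger_eps = error_exponents(n, n ** (1 / 4))
print(balanced, smaller_eps, larger_eps)
```

Moving eps away from n^(1/6) in either direction makes one of the two exponents larger than n^(2/3) * log n, which is why the final bound has the exponent 2/3.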



5. The Proof of Lemma 4.4



As in [5], we begin with the spectral concentration results of Guionnet and Zeitouni, in particular 
the following special case of Corollary 1.8(a) in [8]: 

Theorem 5.1. Let Y be an n x n matrix whose entries are independent random variables each
having support on a compact set of diameter at most K, and let $Z = Y^T Y$. Let $\lambda_1, \dots, \lambda_n$ be the
eigenvalues of Z, and let f be an increasing, convex function such that $g(x) = f(x^2)$ has Lipschitz
norm \g\L- Then for any 6 > do := ^I^^^^MIl^ 

P(| /(A.) - E(f; /(A.))| > 26n) < 4exp (Jl-^) . 
,-1 ,-1 \ ^ \9\l / 



Ideally, we would like to apply this theorem with f taken to be the logarithm, so that $\sum f(\lambda_i) =
\log\det A^2$. The difficulty is that the logarithm is not Lipschitz. To overcome this problem, we follow
[5] and truncate the logarithm. Write

$\log^{\epsilon} x = \max\{2\log\epsilon, \log x\}$,

where $\log^{\epsilon}(0)$ is defined to be $2\log\epsilon$. Note that we have

(9) $\log(\det_{trunc} A) = \frac{1}{2}\sum_{\sigma \in \mathrm{spec}(AA^T)} \log^{\epsilon}(\sigma)$.

Although $\log^{\epsilon}(x^2)$ now has finite Lipschitz constant $2\epsilon^{-1}$ (this was the purpose of truncating the
logarithm), it is not convex. However, it can easily be written as the difference of two convex
Lipschitz functions, so the above theorem applies, and we have for some absolute constants $C_0$ and
Ci and any 6 >= Sq := C^e^^ /n that 



(10) P(| log{dettruncA) - E{log{dettruncA))\ > 6n) < 4exp(-Ci— — : 

lb 

log n 



Taking 6 = we see that for some constant cq we have 
(11) P(| log{dettruncA) - E(log(dett,„„c^))| > = 0(n 



nlogn ^,„_coiognx 



This would be exactly the result we wanted, if only the expectation and the logarithm were switched
on the left hand side of (10). Following [5], we now write

$U(A) = \log(\det_{trunc} A) - E(\log(\det_{trunc} A))$.

We know E(U) = 0, and by Jensen's inequality we have

$1 \le E(e^U) \le E(e^{|U|}) \le 1 + \int_0^\infty e^t P(|U| > t)\,dt$.

It follows from the above and (10) that

$1 \le E(e^U) = \frac{E(\det_{trunc} A)}{e^{E(\log(\det_{trunc} A))}} \le \exp(O(n\epsilon^{-2}))$,

and the first half of Lemma 4.4 follows by taking logarithms and combining with (11). The second
half follows from the identical calculation applied to $e^{2U}$.

Remark 5.2. If we only had required that the truncated determinant concentrate somewhere, the
argument above would have given a stronger bound (roughly $\exp(\epsilon^{-1}n^{1/2}\log n)$). The dominant
term in our bound came from showing that the "somewhere" was close to the actual expectation.

Also, we did not at any point use our lower bound on the variance of the entries. In particular,
this truncated determinant will be concentrated around its expectation even if we allow most of
the entries of A to be non-random. However, it is not true in general that $\det_{trunc} A$ will be close
to det A.



6. The Proof of Lemma 4.5



We begin by first reducing from the general case back to the case of Bernoulli matrices. To do so,
we will use the idea of Bernoulli decomposition from a paper of Aizenman et al. [1]. In this paper,
it is shown that for any random variable X that is nondegenerate (not taking on any single value
with probability 1), we can find a $p \in (0, 1)$ and functions f(t) and g(t) such that

• If t is uniform on [0, 1], and $\epsilon$ is a Bernoulli variable independently equal to 1 (with
probability p) or 0 (with probability 1 - p), then $f(t) + g(t)\epsilon$ has the same distribution as
X.

• inf g(t) > 0.

Recall that we are assuming that our entries are both uniformly bounded in magnitude by K and
bounded below in variance by c. It follows from the methods of [1] (see Remark 2.1(i) there) that
in this case we can find a Bernoulli decomposition $a_{ij} = f_{ij}(t_{ij}) + g_{ij}(t_{ij})\epsilon_{ij}$ of every entry of A in which
the $g_{ij}(t_{ij})$ have a uniform lower bound $\beta = \beta(K, c)$ for all values of i, j, and t, and for which the $p_{ij}$
in the decompositions are uniformly bounded away from 0 and 1.
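
To make the notion of a Bernoulli decomposition concrete, here is a toy example of our own (it is not taken from [1]): a variable X uniform on {0, 1, 2, 3} is written as f(t) + g(t) * e, where f is a step function, g is identically 1, and e is a fair 0/1 coin. The sketch computes the law of f(t) + g(t) * e exactly with rational arithmetic.

```python
from collections import Counter
from fractions import Fraction

HALF = Fraction(1, 2)


def f(t):
    """Step function: 0 on [0, 1/2), 2 on [1/2, 1]."""
    return 0 if t < HALF else 2


def g(t):
    """Constant jump size, so inf g = 1 > 0."""
    return 1


def law_of_decomposition(p):
    """Exact law of f(t) + g(t) * e, t uniform on [0, 1], P(e = 1) = p."""
    # f and g are constant on each half of [0, 1], so one representative
    # point per half, weighted by the length of that half, is enough.
    pieces = [(Fraction(1, 4), HALF), (Fraction(3, 4), HALF)]
    law = Counter()
    for t, weight in pieces:
        for e, prob in ((1, p), (0, 1 - p)):
            law[f(t) + g(t) * e] += weight * prob
    return dict(law)


print(law_of_decomposition(HALF))
```

Once t is exposed, the variable f(t) + g(t) * e takes only the two values f(t) and f(t) + g(t), which is the sense in which every entry becomes a shifted Bernoulli variable.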

We now view our matrix as being formed in two steps. First, we expose $t_{ij}$ for each entry. At this
point every entry can be viewed as having a shifted Bernoulli distribution. Next we expose the $\epsilon_{ij}$.
It follows by taking expectations over all possible values of $t_{ij}$ that it suffices to show the following.

Lemma 6.1. Let $0 < q \le 1/2$ and B, C, c > 0 be fixed. Let A be a matrix whose entries are
independent random variables distributed as $a_{ij} = m_{ij} + \epsilon_{ij}n_{ij}$, where $|m_{ij}| \le n^{1/8}$ and $c \le n_{ij} \le C$,
and furthermore the $\epsilon_{ij}$ satisfy

$q \le P(\epsilon_{ij} = 1) = 1 - P(\epsilon_{ij} = -1) \le 1 - q$.

Then for sufficiently large n we have

$P(\sigma_n(A) \le n^{-4B-7}) \le n^{-B}$.

Remark 6.2. The form of this theorem is very similar to that of the smoothed analysis of the
smallest singular value in [19]. The key difference here is that we no longer require the $n_{ij}$ to be
identical.

Proving Lemma 6.1 is equivalent to bounding the probability that for some unit vector v we have
$\|Av\| \le n^{-4B-7}$. We will do this by dividing the vectors into two classes, which should be thought
of as "structured" and "unstructured", for an appropriate definition of "structured" depending
both on A and on B.

Definition 6.3. A vector v is rich if there is some i for which

$\sup_z P\left(\left|\sum_{j=1}^n a_{ij}v_j - z\right| \le n^{-4B-13/2}\right) \ge n^{-B-1}$.

Otherwise v is poor.

Equivalently, a poor vector is one for which no individual coordinate of Av is too concentrated.
Lemma 4.5 would be an immediate consequence of the following two lemmas.
Lemma 6.4.

P(||^^^|| < n~'^^~'^ for some poor v) < 2^^^ 



Lemma 6.5. 



Proof of Lemma 16.41 



Pdl^fll < n for some rich v) < —n ^ 



We adapt an argument from [13] (see also [19]). Let E be the event that for some poor unit vector
v we have $\|Av\| \le n^{-4B-7}$. If E holds, then the least singular value of A is at most $n^{-4B-7}$, so
the same must hold for $A^T$. For $1 \le j \le n$, let $F_j$ be the event that there exists a unit vector
$w = (w_1, \dots, w_n)^T$ which simultaneously satisfies

$\|w^T A\| \le n^{-4B-7}, \qquad |w_j| \ge \frac{1}{\sqrt{n}}$.

Since every unit vector w has at least one coordinate at least $n^{-1/2}$ in magnitude, we have

$P(E) \le \sum_{j=1}^n P(E \wedge F_j)$.

Now let j be fixed. Let $A_1, \dots, A_n$ be the rows of A. We will condition on all of the rows except
row j. If E is to hold, there must be a poor v such that

$\left(\sum_{i=1}^n |A_i \cdot v|^2\right)^{1/2} = \|Av\| \le n^{-4B-7}$.

It follows that if $P(E \mid A_1, \dots, A_{j-1}, A_{j+1}, \dots, A_n)$ is non-zero, then there is a poor u such that

(12) $\left(\sum_{i \ne j} |A_i \cdot u|^2\right)^{1/2} \le n^{-4B-7}$.

Conversely, by our assumptions on w we have that if $F_j$ holds, then

$\left\|\sum_{i=1}^n w_i A_i\right\| \le n^{-4B-7}$.

Taking inner products with u and using the triangle inequality, we conclude

$|w_j|\,|A_j \cdot u| \le \sum_{i \ne j} |w_i|\,|A_i \cdot u| + n^{-4B-7}$.

Combining the above with (12), the Cauchy-Schwarz inequality, and our assumption on $|w_j|$, we
obtain that if both E and $F_j$ hold, then

$|A_j \cdot u| \le 2n^{-4B-13/2}$.

On the other hand, since u is poor and $A_j$ and u are independent, we have that

$P(|A_j \cdot u| \le 2n^{-4B-13/2} \mid A_1, \dots, A_{j-1}, A_{j+1}, \dots, A_n) \le n^{-B-1}$.

Combining the above, we see that

$P(E \wedge F_j \mid A_1, \dots, A_{j-1}, A_{j+1}, \dots, A_n) \le n^{-B-1}$,

regardless of our choice of the remaining n - 1 rows. It follows that $P(E \wedge F_j) \le n^{-B-1}$ for every
j, and therefore that $P(E) \le n^{-B}$.
Proof of Lemma 6.5. Let J be the unique integer satisfying $2B + 2 < J \le 2B + 3$ and let
$\delta = (B + 1)/J$. Let $\gamma > 0$ be a constant chosen to be sufficiently small that $\delta + 3\gamma < 1/2$ and
(4S + 7)7 < i. Finally, we let D = 2 + 27. 

Let v be a rich unit vector. We define

$g(j) := \sup_{i,z} P(|A_i^T v - z| \le n^{-4B-13/2+Dj})$.

Clearly $0 \le g(j) \le 1$, and g(j) is an increasing function in j. The assumption that v is rich is
equivalent to the statement that $g(0) \ge n^{-B-1}$. It follows from the pigeonhole principle that for
some $0 \le j \le J - 1$ we have

$g(j+1) \le n^{\delta}g(j)$.

For $0 \le j \le J - 1$ and $1 \le k \le \lceil (B+1)/\gamma \rceil$, we define $\Omega_{j,k}$ to be the collection of rich v satisfying both

$g(j+1) \le n^{\delta}g(j)$ and $g(j) \in [n^{-k\gamma}, n^{-(k-1)\gamma}]$.

Since every rich v is contained in some $\Omega_{j,k}$, and there are only a bounded number of pairs (j, k),
it suffices to prove that for every fixed j and k we have

(13) $P(\|Av\| \le n^{-4B-7}$ for some $v \in \Omega_{j,k}) = o(n^{-B})$.

Our goal will now be to construct a $\beta$-net for each $\Omega_{j,k}$, that is, a set $V_0$ such that any point in
$\Omega_{j,k}$ is within (Euclidean) distance $\beta$ of some point in $V_0$. Assuming that for sufficiently small $\beta$
the net is not too large, we will then be able to obtain Lemma 6.5 by a union bound. We begin bounding
the size of the net with the following result, a special case of [17, Thm. 3.2, see also Remark 2.8]:

Theorem 6.6. Let < q < ^ and let xi, . . . Xn be independent random variables taking on values 
in {1,-1} and satisfying 

q < P{xj = 1) <l-q. 

Let < 6 < 1 be fixed, and let p and j3 be chosen to satisfy p = n^^^^^ and j3 > exp(— n"''/^). Then 
the set of vectors {vi, . . . Vn) satisfying 

n 

sup P(| ^ ViXi - z\ < [3) <p 

z&C i=i 

has a P-net in the loo norm of size at most n^O-/'^+^)'^p~^ _)_ exp(o(n)). 
For any particular i, we have 

n n 

(14) P(| ^(mij + nijeij)vj - z\ < p) = P(| ^ eijVj -z\< p, 

where {fj = vjUij and z = z — rriijVj. For any particular coordinate of Av, Theorem 16.61 gives 
an upper bound on the minimal size of a /?— net for the set of v for which the right hand side 
of (1141) holds with probability at least p. By taking an affine transformation Vj ^ ^ of the 

case P = n"^^^^^/^^^-' , p = n~^'~' of this net and taking the union of the resulting net for each 
coordinate, we obtain the following modified version of Theorem 16.61 

Lemma 6.7. Let xi, . . .Xn be independent and have the form Xi = mi + eirii, where the m, n, e are 
as in Lemma \6.1\ Let < (5 < 1 6e fixed. Then ^j^k has an " '^^ 13/2+Dj ^^^^ norm of 

size at most n^-{'^/'^+^)'^n^i'^ -\- exp(o(n)). 
Let $V_0$ be a net guaranteed by the above lemma, and consider any $v' \in V_0$ and $v \in \Omega_{j,k}$ such that
$\|v - v'\|_\infty \le \beta$. Our bounds on the $m_{ij}$ and $n_{ij}$ guarantee that (assuming n to be sufficiently large)
the spectral norm of A satisfies $\sigma_n(A) \le n$. Since

$\|Av'\| \le \|Av\| + \|A(v - v')\| \le \|Av\| + n^{1/2}\sigma_n(A)\|v - v'\|_\infty$,

it follows that if $\|Av\| \le n^{-4B-7}$ then

ll^^'ll <(l + l)„-4S-4+D,; 

It follows that there must be at least n — n^~'' rows of A for which 
(15) lAjv'l < n-4^-9/2+^^-+7. 



On the other hand, we also have for any i for which (15) holds that

n 

\AJv\ < {Xfv'l + \\v - v'Woo y^^jruij + Uij) 

< ^-4B-9/2+Di+7 + l„-4B~9/2+Di(l ^ ^1/8) 
C C 

where the last inequality comes from our definition of D.
It follows that 

Pi\A[v'\ < n-4^-9/2+^^-+7) < P(\A[v\ < n-4^-^+^(-'-+i)) 

< n^gU) 

where for the last two inequalities we use the definition of $\Omega_{j,k}$.

This will be sufficient to handle the case where k is sufficiently large. For smaller k, we note that 
by our choice of 7 and D we have —4B — 9/2 + Dj + 7 < —1, so 

F{\Afv'\ < n-4^-9/2+^^-+7) < P(\Ajv' < -\), 

n 

which can easily be checked to be at most 1 — q. 
Therefore 

P{\Afv'\ < j^-4B-9/2+Di+7)) < niin(n^+^-'=% 1 - q). 
Taking the union bound over all sets of n — n^~^ rows, we see that 

P(| lAv'W < n-4-B-4+i3i+7) < niin(n^+^-'=% 1 - g)"' ( ^ 

\n — 

for any particular v' in our net for $\Omega_{j,k}$. Taking the union bound over the entire net, we obtain
that the probability that the event on the left hand side of (13) holds is at most

,1-7/ n 



(^-(i/2+5)n+i^fc7n + exp(o(n))) min(n'^+^-'=% 1 - q)' 



which can be verified to be exponentially small by a routine calculation. 
7. The Proof of Lemma 4.6



As in the proof of Lemma 4.5, it suffices by Bernoulli decomposition to prove the following special
case of this lemma:

Lemma 7.1. Let $0 < q \le 1/2$ and B, C, c > 0 be fixed. Let A be a matrix whose entries are
independent random variables distributed as $a_{ij} = m_{ij} + \epsilon_{ij}n_{ij}$, where $|m_{ij}| \le n^{1/8}$ and
$c \le n_{ij} \le C$, and furthermore the $\epsilon_{ij}$ satisfy

$q \le P(\epsilon_{ij} = 1) = 1 - P(\epsilon_{ij} = -1) \le 1 - q$.

Then for $r \ge \log^2 n$,

To prove this lemma we are going to use the following lemma from [20].

Lemma 7.2. [20] Let M be an m x n matrix ($m \le n$). Let $d_i$ be the distance from its ith row
vector to the space spanned by the remaining rows and $\sigma_i$ be its singular values. Then

$\sum_{i=1}^m \sigma_i^{-2} = \sum_{i=1}^m d_i^{-2}$.
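
The identity relating singular values and distances can be checked exactly on a small example. The sketch below is our own illustration and relies on two standard facts that we state as assumptions: the sum of the inverse squared singular values of M equals the trace of the inverse of the Gram matrix M M^T, and each d_i is taken to be the distance from row i to the span of all the other rows. All arithmetic uses exact rationals, and the rows are assumed linearly independent.

```python
from fractions import Fraction


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def sq_dist_to_span(vec, others):
    """Squared distance from vec to span(others), via exact Gram-Schmidt."""
    basis = []
    for o in others:
        r = [Fraction(x) for x in o]
        for b in basis:
            c = dot(r, b) / dot(b, b)
            r = [x - c * y for x, y in zip(r, b)]
        basis.append(r)
    r = [Fraction(x) for x in vec]
    for b in basis:
        c = dot(r, b) / dot(b, b)
        r = [x - c * y for x, y in zip(r, b)]
    return dot(r, r)


def trace_of_inverse(g):
    """Trace of the inverse of a square matrix, by Gauss-Jordan elimination."""
    n = len(g)
    aug = [
        [Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(g)
    ]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return sum(aug[i][n + i] for i in range(n))


m = [[1, 2, 0], [0, 1, 1]]  # a 2 x 3 example with independent rows
gram = [[dot(u, v) for v in m] for u in m]
lhs = trace_of_inverse(gram)  # sum of sigma_i^(-2)
rhs = sum(1 / sq_dist_to_span(m[i], m[:i] + m[i + 1:]) for i in range(len(m)))
print(lhs, rhs)
```

Both sides come out as the same rational number, so a matrix with several very small singular values must have rows that are very close to the span of the others.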

Recall that log^n < r < ^. By the interlacing inequalities for singular values (see, for example, 
Theorem 7.3.9 in [9]), we have that

$\sigma_{2r}(A) \ge \sigma_r(A')$,

where A' is the matrix formed by removing the last r columns from A.
To bound the right hand side of this equation, we note that

(16) $r\,\sigma_r(A')^{-2} \le \sum_{k=1}^{r} \sigma_k(A')^{-2} \le \sum_{k=1}^{n-r} \sigma_k(A')^{-2} = \sum_{i=1}^{n-r} d_i^{-2}$,

where $d_i$ denotes the distance from the ith column of A' to the span of the remaining columns, and
for the last equality we use Lemma 7.2.

Informally, this states that if a matrix has many small singular values, it must have many columns 
which are very close to the subspace spanned by the other columns. Since r is becoming increasingly 
large, the co-dimension of this subspace is increasing as well, so this should become unlikely. 

Now let i be fixed. To bound the probability that $d_i$ is small, we first expose the subspace $S_i$
spanned by the remaining n - r - 1 columns, then finally the remaining column. Let P denote the
projection matrix onto that subspace, and let $p_{kl}$ be the entries of P. Let $X_i = (a_{1i}, \dots, a_{ni})$ be this
final column. We have



n4\Si) = E(|X,|2)-E(|PX,|2) 



j=l k=l 1=1 



n 



^afj{l-pki) 



n 

= c^(n — Tr{P)) = (?r 

as the terms with $k \ne l$ cancel by the independence of the entries of A, and we again use the fact that the
entries of A are bounded away from 0.

Remark 7.3. If the entries of A were to have equal variance c, then the inequality here would
actually be an equality, and the expected square distance would be independent of $S_i$. This is
what enables the arguments of [6], [18] (which are based on row by row exposure of the matrix in
question), and why those arguments do not carry over here to give an immediate estimate on the
determinant of A.



In other words, a random vector from our distribution will on average be far away from any fixed 
n — r — 1 dimensional subspace. It remains to show that it will typically be far away. 

It follows from Talagrand's inequality [15] and our bounds on the $n_{ij}$ that if $M_{i,S}$ is the median
value of $d_i$ conditioned on $S_i$, then



Pi\di-Mi^s\ >t\Si)<Aexp{- 



By an argument identical to that in [181, it can be shown that |Mj — -^J E[df)\ < It therefore 
follows that for sufficiently large r, 

< < 4exp(-^^) = o(n-i°g"-i), 

as by assumption $r \ge \log^2 n$.

In particular, with probability 1 — o(n~'°s") every di will be at least c^r/2. Combining this with 
(fT6l) . we see 

Pia2riA)<-^) < PiaAA'y'y^^^) 
— r c^r'^ 



i=l 
8. Concentration of Determinants for Gaussian variables



Although we have focused mainly on the concentration of determinants for the case of matrices with 
uniformly bounded entries, our main results also hold in the case where every entry has a Gaussian 
distribution, assuming that the means of the entries are uniformly bounded and the variances of 
the entries are uniformly bounded above and below. In particular, this implies that our bounds 
hold for Barvinok's as well as Godsil and Gutman's estimator for the permanent. 

The proof of Lemma 4.4 is exactly the same as before, except that we use Corollary 1.8(b) of [8]
instead of Corollary 1.8(a). For the remaining two lemmas, we again use the idea of Bernoulli
decomposition. It can be explicitly checked that if X is a Gaussian variable satisfying |E(X)| < 
c < Var(X) < C, then X can be decomposed as 



where t is uniform on [0, 1], e is uniform on { — 1, 1}. Furthermore, we can do so in such a way that 
g{t) is bounded uniformly from below, and the measure of the set of t for which + < log^ n 



We now expose $t_{ij}$ for each entry of A. At this point every entry will have a Bernoulli distribution
and (except for an exceptional set of probability $o(n^{-B})$ for any B) the mean and variance of the
entries will be bounded by $\log^2 n$. Lemmas 4.5 and 4.6 now follow as before from Lemmas 6.1 and 7.1.
Appendix A. Log-Normality of the Determinant of Gaussian Matrices



In this appendix we will show that the determinant of a matrix whose entries are iid standard
Gaussian random variables has the distribution given by Theorem 1.2. Our starting point is the
formula

(18) $|\det A| = \prod_{i=1}^n d_i$,

where $d_i$ is the distance from the ith row of A to the subspace spanned by the previous i - 1 rows.

This formula is particularly useful for Gaussian vectors due to their rotational invariance: if x is a
random vector whose coordinates are iid Gaussian variables having mean zero, then the distribution of
the distance from x to a fixed subspace S depends only on the dimension of S. If the dimension
is n - k, then the square of the distance follows a chi-square distribution with
k degrees of freedom. In particular, this implies that the distribution of the determinant is the
same as that where we treat each of the variables in (18) as being independent and following a chi
distribution. We will do so for the remainder of this appendix.

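Taking logarithms in (18) and rearranging (after relabeling so that $d_i^2$ has i degrees of freedom), we see that

$2\log(|\det A|) - \log n! = \sum_{i=1}^n \log\left(\frac{d_i^2}{i}\right) = \sum_{i=1}^n \log\left(1 + \frac{d_i^2 - i}{i}\right)$.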

Following the ideas of [B], we next perform a Taylor expansion on the right hand side, writing 

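(19) $\frac{2\log(|\det A|) - \log n!}{\sqrt{2\log n}} = \frac{\sum_{i=1}^n \frac{d_i^2-i}{i}}{\sqrt{2\log n}} - \frac{1}{2}\,\frac{\sum_{i=1}^n \left(\frac{d_i^2-i}{i}\right)^2}{\sqrt{2\log n}} + \frac{1}{3}\,\frac{\sum_{i=1}^n \left(\frac{d_i^2-i}{i}\right)^3}{\sqrt{2\log n}} + \frac{\sum_{i=1}^n e_i}{\sqrt{2\log n}}$,

where $e_i$ denotes the remainder in the expansion of the ith logarithm.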

We examine the terms in order. 

It follows from standard facts about the chi-square distribution that $\frac{d_i^2-i}{i}$ has mean 0, variance $\frac{2}{i}$,
and fourth moment $\frac{12i+48}{i^3}$. It follows immediately from Lyapunov's Central Limit Theorem that
the first term on the right hand side of (19) converges weakly to N(0, 1).
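
The moments of the normalized chi-square variable can be computed exactly from the raw moments E[X^m] = k(k+2)...(k+2m-2) of a chi-square variable X with k degrees of freedom, which is a standard fact. The following sketch (function names are ours) expands the central moments with the binomial theorem and prints the mean, variance and fourth moment of (X - k)/k for a few values of k.

```python
from fractions import Fraction
from math import comb


def raw_moment(k, m):
    """E[X^m] for X chi-square with k degrees of freedom."""
    out = 1
    for j in range(m):
        out *= k + 2 * j
    return out


def normalized_central_moment(k, m):
    """E[((X - k) / k)^m] for X chi-square with k degrees of freedom."""
    total = sum(
        comb(m, j) * raw_moment(k, j) * (-k) ** (m - j) for j in range(m + 1)
    )
    return Fraction(total, k ** m)


for k in (1, 2, 5, 10):
    print(k, [normalized_central_moment(k, m) for m in (1, 2, 4)])
```

The variance 2/k sums to roughly 2 log n over k = 1, ..., n, which is the source of the normalization by the square root of 2 log n, while the fourth moments are summable in k.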

For the second term, we observe has mean | and variance ^^^r^- In particular, the variance 

of X](~V~)^ o(logn). It follows that the second term converges to ■^==. Similarly, it follows 

from the moments of the chi-square distribution that the expectation and the variance of $\sum_{i=1}^n \left(\frac{d_i^2-i}{i}\right)^3$
are O(1) = o(log(n)), so the third term converges weakly to zero.

The final term is slightly more complicated due to the singularity of the logarithm at 0. We first 
split the error term as ej = + , where is zero whenever < | , and is zero whenever dj > | . 
We will show separately that the contribution of each of these errors converges weakly to
zero.

For the first error term (the case where di is large), we note that |e^| = 0{\df — and therefore 

(using the fourth moment given above) E(|e-|) = 0(1). It follows that ^^^^^ converges to zero, so 
the first part of our decomposition is negligible. 

It can be checked by direct computation that $E(|\log(d_i^2)|)$ is finite for any i. The same therefore
also holds for $E(|e_i''|)$, so it follows that for some function s = s(n) diverging to infinity sufficiently
slowly we have

(20) ^^=1 "^"^ 

From the fourth moment given above we know that P((ii < |) = 0{^). Since s diverges to infinity, 
it follows immediately that e" is almost surely zero. Combining this with (j20p . we see that 

the e" portion of our truncation error is also negligible. 

The theorem follows from our bounds on each term in the Taylor expansion. 